Rule-based Approach for Arabic Root Extraction: New Rules to Directly Extract Roots of Arabic Words

Authors

  • Fatma Abu Hawas
  • Keith E. Emmert
Abstract

Extracting word roots in Arabic is difficult because of the morphological and structural changes that words undergo in the language. Several techniques have been proposed to address this problem. This paper continues the work, begun in [1], of identifying and exploiting relationships among Arabic letters for root extraction. Eight rules that detect the root letters of a word from the other letters in that word are proposed and tested; four of them rely on the idea of morphological substitution (mutation). The approach is evaluated on the words of the Holy Quran, and the results indicate that the root extraction algorithm is promising.
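As a rough illustration of why morphological substitution matters for root extraction, the sketch below handles one well-known case: a three-letter "hollow" word whose middle letter is alif, where the underlying root letter is a weak letter (waw or ya). For example, the word qāl (ق ا ل) has the root q-w-l. This is not one of the eight rules from the paper, which are not given in the abstract; the function name `hollow_root_candidates` and the single rule it encodes are assumptions made purely for illustration.

```python
# Hypothetical illustration only: NOT the paper's rules, which the abstract does not list.
ALIF = "\u0627"  # ا
WAW = "\u0648"   # و
YA = "\u064a"    # ي
WEAK_LETTERS = (WAW, YA)


def hollow_root_candidates(word: str) -> list[str]:
    """Return candidate triliteral roots for an undiacritized word.

    If the word has exactly three letters and the middle one is alif,
    the alif may stand in for a weak root letter, so one candidate is
    produced per weak letter. Otherwise the word is returned unchanged.
    """
    if len(word) == 3 and word[1] == ALIF:
        return [word[0] + weak + word[2] for weak in WEAK_LETTERS]
    # No substitution applies; the word itself is the only candidate.
    return [word]


if __name__ == "__main__":
    qal = "\u0642\u0627\u0644"  # ق ا ل
    for candidate in hollow_root_candidates(qal):
        print(candidate)
```

The sketch only generates candidates; choosing between them (for instance, q-w-l rather than q-y-l) would need further evidence, such as the surrounding letters that the paper's rules are said to inspect.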


Similar resources

Towards a new Approach for Arabic root extraction: Exploit relations between the word letters and their placement in the word for Arabic root extraction

This paper presents a new root-extraction approach for Arabic words. The approach tries to assign each Arabic word a unique root without relying on a database of word roots, a list of word patterns, or a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict the positions of the root letters one by one based on some rules and relati...


Extracting the roots of Arabic words without removing affixes

Most research in Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, which is called the Word Substring ...


Nahla A Belal: An Efficient Rank Based Arabic Root Extractor

A morphologically rich language such as Arabic requires deep analysis; this is due to its invaluable characteristics, which are beneficial for the task of root extraction. This paper investigates employing new techniques to enumerate and rank possible roots for a given word, using linguistic rules as scoring mechanisms. The proposed tech...


A Genetic-Based Extensible Stemmer for Arabic Verbs

First, we address the problem of rule definition for x-fixing Arabic roots. Instead of the traditional approach that relies on the semantics conveyed by x-fixing, we base our approach on the lexica. This paper presents an extensible schema for rule definition, which we use to partially cover the cases of verbs produced by pre- and suffixing triliteral roots. We then present a stemming sys...


Unsupervised Induction of Arabic Root and Pattern Lexicons using Machine Learning

We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allows us to...



Journal title:
  • CIT

Volume 22

Publication date: 2014